Estimating the Average of a Lipschitz-Continuous Function from One Sample



Abhimanyu Das David Kempe* 

University of Southern California University of Southern California
abhimELnd@usc.edu dkempe@usc.edu 

January 21, 2011 



Abstract 

We study the problem of estimating the average of a Lipschitz continuous function $f$ defined
over a metric space, by querying $f$ at only a single point. More specifically, we explore the role of
randomness in drawing this sample. Our goal is to find a distribution minimizing the expected
estimation error against an adversarially chosen Lipschitz continuous function. Our work falls
into the broad class of estimating aggregate statistics of a function from a small number of
carefully chosen samples. The general problem has a wide range of practical applications in
areas as diverse as sensor networks, social sciences and numerical analysis. However, traditional
work in numerical analysis has focused on asymptotic bounds, whereas we are interested in the
best algorithm. For arbitrary discrete metric spaces of bounded doubling dimension, we obtain
a PTAS for this problem. In the special case when the points lie on a line, the running time
improves to an FPTAS. Both algorithms are based on approximately solving a linear program
with an infinite set of constraints, by using an approximate separation oracle. For Lipschitz-continuous
functions over [0,1], we calculate the precise achievable error as $1 - \frac{\sqrt{3}}{2} \approx 0.134$,
which improves upon the $\frac{1}{4}$ which is best possible for deterministic algorithms.

1 Introduction 

One of the most fundamental problems in data-driven sciences is to estimate some aggregate statistic
of a real-valued function $f$ by sampling $f$ in few places. Frequently, obtaining samples incurs a cost
in terms of human labor, computation, energy or time. Thus, researchers face an inherent tradeoff
between the accuracy of estimating the aggregate statistic and the number of samples required.
With samples a scarce resource, it becomes an important computational problem to determine
where to sample $f$, and how to post-process the samples.

Naturally, there are many mathematical formulations of this estimation problem, depending on
the aggregate statistic that we wish to estimate (such as the average, median or maximum value),
the error objective that we wish to minimize (such as worst-case absolute error, average-case squared
error, etc.), and on the conditions imposed on the function. In this paper, we study algorithms
optimizing a worst-case error objective, i.e., we assume that $f$ is chosen adversarially. Motivated by
the applications described below, we use Lipschitz-continuity to impose a "smoothness" condition
on $f$. (Note that without any smoothness conditions on $f$, we cannot hope to approximate any
aggregate function in an adversarial setting without learning all function values.) That is, we
assume that the domain of $f$ is a metric space, and that $f$ is Lipschitz-continuous over its domain.
Thus, nearby points are guaranteed to have similar function values.

Here, we focus on perhaps the simplest aggregation function: the average $\bar f$. Despite its
simplicity, it has many natural applications, such as

1. In sensor networks covering a geographical area, the average of a natural phenomenon (such 
as temperature or pressure) is frequently one of the most interesting quantities. Here, nearby 
locations tend to yield similar measurements. Since energy is a scarce resource, it is desirable 
to sample only a few of the deployed sensors. 

2. In population surveys, researchers are frequently interested in the average of quantities such 
as income or education level. A metric on the population may be based on job similarity, 
which would have strong predictive value for these quantities. Interviewing a subject is time- 
consuming, and thus sample sizes tend to be much smaller than the entire population. 

3. In numerical analysis, one of the most fundamental problems is numerical integration of a 
function. If the domain is continuous, this corresponds precisely to computing the average. 
If the function to be integrated is costly to evaluate, then again, it is desirable to sample a 
small number of points. 

If $f$ is to be evaluated at $k$ points, chosen deterministically and non-adaptively, then previous
work [3] shows that the optimum sampling locations for estimating the average of $f$ form a
$k$-median of the metric space. However, the problem becomes significantly more complex when the
algorithm gets to randomize its choices of sampling locations. In fact, even the seemingly trivial
case of $k = 1$ turns out to be highly non-trivial, and is the focus of this paper. Addressing this case
is an important step toward the ultimate goal of understanding the tradeoffs between the number
of samples and the estimation error.

Formally, we thus study the following question: Given a metric space $\mathcal{M}$, a randomized sampling
algorithm is described by (1) a method for sampling a location $x \in \mathcal{M}$ from a distribution $p$; (2) a
function $g$ for predicting the average $\bar f$ of the function $f$ over $\mathcal{M}$, using the sample $(x, f(x))$. The
expected estimation error is then $E(p, g, f) = \sum_{x \in \mathcal{M}} p_x \cdot |g(x, f(x)) - \bar f|$. (The sum is replaced
by an integral, and $p$ by a density, if $\mathcal{M}$ is continuous.) The worst-case error is
$E_w(p, g) = \sup_{f \in L} E(p, g, f)$, where $L$ is the set of all 1-Lipschitz continuous functions defined on $\mathcal{M}$. Our
goal is to find a randomized sampling algorithm (i.e., a distribution $p$ and function $g$, computable
in polynomial time) that (approximately) minimizes $E_w(p, g)$.

In this paper, we provide a PTAS for this problem of minimizing $E_w(p, g)$, for any discrete
metric space $\mathcal{M}$ with constant doubling dimension. (This includes constant-dimensional Euclidean
metric spaces.) For discrete metric spaces $\mathcal{M}$ embedded on a line, we improve this result to an
FPTAS. Both of these algorithms are based on a linear program with infinitely many constraints,
for which an approximate separation oracle is obtained.

We next study the perhaps simplest variant of this problem, in which the metric space is the
interval [0, 1]. While the worst-case error of any deterministic algorithm is obviously $\frac{1}{4}$ in this case,
we show that for a randomized algorithm, the bound improves to $1 - \frac{\sqrt{3}}{2}$. We prove this by providing
an explicit distribution, and obtaining a matching lower bound using Yao's Minimax Principle. Our
result can also be interpreted as showing how "close" a collection of Lipschitz-continuous functions
on [0, 1] must be.
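As a numerical sanity check of these two constants, the following Python sketch evaluates a single adversarial candidate, the "tent" function $f(x) = |x - \frac{1}{2}|$, against (a) the deterministic algorithm that samples at $\frac{1}{2}$ and (b) uniform sampling on $[2 - \sqrt{3}, \sqrt{3} - 1]$, the distribution that Theorem 4.1 identifies as optimal. The helper names (`average`, `expected_error_uniform`, `tent`) and the midpoint-rule integration are illustrative assumptions, not part of the paper. The check covers only this one function and does not establish the worst case.

```python
import math

def average(f, n=10_000):
    """Midpoint-rule approximation of the average of f over [0, 1]."""
    return sum(f((i + 0.5) / n) for i in range(n)) / n

def expected_error_uniform(f, lo, hi, n=10_000):
    """Expected |f(x) - average(f)| when x is drawn uniformly from [lo, hi]."""
    f_bar = average(f)
    width = hi - lo
    return sum(abs(f(lo + (i + 0.5) * width / n) - f_bar) for i in range(n)) / n

def tent(x):
    """A 1-Lipschitz function on [0, 1] with average 1/4."""
    return abs(x - 0.5)

# Deterministic algorithm: sample at 1/2 and output the value read.
det_error = abs(tent(0.5) - average(tent))

# Randomized algorithm: sample uniformly from [2 - sqrt(3), sqrt(3) - 1].
rand_error = expected_error_uniform(tent, 2 - math.sqrt(3), math.sqrt(3) - 1)

print(det_error, rand_error, 1 - math.sqrt(3) / 2)
```

For this particular function, the deterministic error evaluates to $\frac{1}{4}$ and the randomized error to $1 - \frac{\sqrt{3}}{2} \approx 0.134$, matching the two bounds quoted above.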

1.1 Related Work 

Estimating the integral of a smooth function $f$ using its values at a discrete set of points is one of
the core problems in numerical analysis. The tradeoffs between the number of samples needed and
the estimation error bounds have been investigated in detail under the name of Information Based
Complexity (IBC) [10, 11]. More generally, IBC studies the problem of computing approximations
to an operator $S(f)$ on functions $f$ from a set $F$ (with certain "smoothness" properties) using a finite
set of samples $N(f) = [L_1(f), L_2(f), \ldots, L_n(f)]$. The $L_i$ are functionals. For a given algorithm $U$,
its error is $E(U) = \sup_{f \in F} \|S(f) - U(f)\|$. The goal in IBC is to find an $\epsilon$-approximation $U$ (i.e.,
ensuring that $E(U) \le \epsilon$) with least information cost $c(U) = n$.

One of the common problems in IBC is multivariate integration of real-valued functions with a
smoothness parameter $r$ over $d$-dimensional unit balls. For such problems, Bakhvalov [2] designed
a randomized algorithm providing an $\epsilon$-approximation with cost $\Theta\left((1/\epsilon)^{2d/(d+2r)}\right)$. Bakhvalov [2] and
Novak [9] also show that this cost is asymptotically optimal. The papers by Novak [9] and Mathe [7]
show that if $r = 0$, then simple Monte-Carlo integration algorithms (which sample from the uniform
distribution) have an asymptotically optimal cost of $1/\epsilon^2$.

In [13, 14], Wozniakowski studied the average case complexity of linear multivariate IBC
problems, and derived conditions under which the problems are tractable, i.e., have cost polynomial in $1/\epsilon$
and $d$. Wojtaszczyk [12] proved that the multivariate integration problem is not strongly tractable
(polynomial in $1/\epsilon$ and independent of $d$).

In [3], Baran et al. study the IBC problem in the univariate integration model for Lipschitz
continuous functions, and formulate approximation bounds in an adaptive setting. That is, the
sampling strategy can change adaptively based on the previously sampled values. They provide
deterministic and randomized $\epsilon$-approximation algorithms, which, for any problem instance $P$, use
$O(\log(\frac{1}{\epsilon \cdot OPT}) \cdot OPT)$ samples for the deterministic case and $O(OPT^{4/3} + OPT \cdot \log(\frac{1}{\epsilon}))$ samples
for the randomized case. Here, $OPT$ is the optimal number of samples for the problem instance
$P$. They prove that their algorithms are asymptotically optimal, compared to any other adaptive
algorithm.

There are two main differences between the results in IBC and our work: first, IBC treats the 
target approximation as given and the number of samples as the quantity to be minimized. Our 
goal is to minimize the expected worst-case error with a fixed number of samples (one). More 
importantly, results in IBC are traditionally asymptotic, ignoring constants. For a single sample, 
this would trivialize the problem: it is implicit in our proofs that sampling at the metric space's 
median is a constant-factor approximation to the best randomized algorithm. 

The deterministic version of our problem was studied previously in [5]. There, it was shown
that the best sampling locations for reading $k$ values non-adaptively constitute the optimal
$k$-median of the metric space. Thus, the algorithm of Arya et al. [1] gives a polynomial-time
$(3 + \epsilon)$-approximation algorithm to identify the best $k$ values to read.

2 Preliminaries



We are interested in real-valued Lipschitz-continuous functions over metric spaces of constant
doubling dimension (e.g., [6]). Let $(\mathcal{M}, d)$ be a compact metric space with distances $d(x, y)$ between
pairs of points. W.l.o.g., we assume that $\max_{x,y \in \mathcal{M}} d(x, y) = 1$. We require $(\mathcal{M}, d)$ to have constant
doubling dimension $\beta$, i.e., for every $\delta$, each ball of diameter $\delta$ can be covered by at most $c^\beta$ balls
of diameter $\delta/c$, for any $c \ge 2$.

A real-valued function $f$ is Lipschitz-continuous (with constant 1) if $|f(x) - f(y)| \le d(x, y)$
for all points $x, y$. We define $L$ to be the set of all such Lipschitz-continuous functions $f$, i.e.,
$L = \{f \mid |f(x) - f(y)| \le d(x, y) \text{ for all } x, y\}$. Since we will frequently want to bound the function
values, we also define $L_c = \{f \in L \mid |\int_x f(x)\,dx| \le c\}$. Notice that $L_c$ is a compact set.

We wish to predict the average $\bar f = \int_x f(x)\,dx$ of all the function values. When $\mathcal{M}$ is finite of
size $n$, then the average is of course $\bar f = \frac{1}{n} \cdot \sum_x f(x)$ instead. The algorithm first gets to choose a
single point $x$ according to a (polynomial-time computable) density function $p$; it then learns the
value $f(x)$, and may post-process it with a prediction function $g(x, f(x))$ to produce its estimate
of the average $\bar f$. The goal is to minimize the expected estimation error of the average, assuming
$f$ is chosen adversarially from $L$ with knowledge of the algorithm, but not its random choices.
Formally, the goal is to minimize $E_w(p, g) = \sup_{f \in L} \left( \int_x p_x \cdot |\bar f - g(x, f(x))|\,dx \right)$. If $\mathcal{M}$ is finite,
then $p$ will be a probability distribution instead of a density, and the error can now be written as

$E_w(p, g) = \sup_{f \in L} \left( \sum_x p_x \cdot |\bar f - g(x, f(x))| \right)$.
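For a finite metric space, the inner quantity of this definition (the expected error against one fixed function) is straightforward to compute. The following Python sketch illustrates it under the assumption that $p$ and $f$ are given as dictionaries keyed by point; the names `average`, `identity` and `expected_error`, and the five-point example, are hypothetical and chosen only for illustration.

```python
def average(f):
    """Average of f over a finite metric space; f maps each point to its value."""
    return sum(f.values()) / len(f)

def identity(x, y):
    """Post-processing that simply outputs the observed value."""
    return y

def expected_error(p, f, g=identity):
    """Sum over x of p_x * |f_bar - g(x, f(x))| for one fixed function f."""
    f_bar = average(f)
    return sum(p[x] * abs(f_bar - g(x, f[x])) for x in f)

# Five points on a path with d(x, y) = |x - y| / 4, so the diameter is 1.
points = [0, 1, 2, 3, 4]
f = {x: abs(x - 2) / 4 for x in points}        # distance to the middle point, 1-Lipschitz
uniform = {x: 1 / len(points) for x in points}
center_only = {x: (1.0 if x == 2 else 0.0) for x in points}

err_uniform = expected_error(uniform, f)
err_center = expected_error(center_only, f)
print(err_uniform, err_center)
```

The worst-case error $E_w(p, g)$ additionally takes a supremum over all $f \in L$, which this sketch does not attempt.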

Formally, we consider an algorithm to be the pair $(p, g)$ of the distribution and prediction
function. Let $\mathcal{A}$ denote the set of all such pairs, and $\mathcal{D}$ the set of all deterministic algorithms, i.e.,
algorithms for which $p$ has all its density on a single point. Our analysis will make heavy use of
Yao's Minimax Principle [8]. To state it, we define $\mathcal{L}$ to be the set of all probability distributions
over $L$. We also define the estimation error $\Delta(f, A) = \int_x p_x \cdot |\bar f - g(x, f(x))|\,dx$, where $A$ corresponds
to the pair $(p, g)$.

Theorem 2.1 (Yao's Minimax Principle [8])

$\sup_{q \in \mathcal{L}} \inf_{A \in \mathcal{D}} E_{f \sim q}[\Delta(f, A)] = \inf_{A \in \mathcal{A}} \sup_{f \in L} \Delta(f, A)$.

We first show that without loss of generality, we can focus on algorithms whose post-processing
is just to output the observed value, i.e., algorithms $(p, \mathrm{id})$ with $\mathrm{id}(x, y) = y$, for all $x, y$. When
$g$ is the identity function, we simply write $\Delta(f, p) = \int_x p_x \cdot |\bar f - f(x)|\,dx$ for the error incurred by
using the distribution $p$.

Theorem 2.2 Let $A^* = (p^*, g^*)$ be the optimum randomized algorithm. Then, for every $\epsilon > 0$,
there is a randomized algorithm $A = (p, \mathrm{id})$, such that $E_w(A) \le E_w(A^*) + \epsilon$.

Proof. Let $\mathcal{A}^{\mathrm{id}}$ denote the set of all (randomized) algorithms using the identity function for
post-processing, i.e., $\mathcal{A}^{\mathrm{id}} = \{A = (p, \mathrm{id}) \mid p \text{ is a distribution over } \mathcal{M}\}$.

For the analysis, we are interested in equivalence classes of functions; we say that $f, f'$ are
equivalent if either (1) there exists a constant $c$ such that $f(x) = c + f'(x)$ for all $x$, or (2) there is
a constant $c$ such that $f(x) = c - f'(x)$ for all $x$. Let $[f]$ denote the equivalence class of $f$, i.e., the
set of all vertical translations of $f$ and its vertical reflection.

Given $\epsilon$, we let $r$ be a large enough constant defined below. A distribution $q$ over $L_r$ is called
equivalence-uniform if the distribution, restricted to any equivalence class, is uniform.^1 That is, for
any $f' \in [f]$ with $f', f \in L_r$, we have $q(f') = q(f)$. Let $U_r$ denote the set of all equivalence-uniform
distributions over $L_r$. We will show two facts:

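1. If $q \in U_r$, then for any deterministic algorithm $A \in \mathcal{D}$, there is a deterministic algorithm
$A' \in \mathcal{D} \cap \mathcal{A}^{\mathrm{id}}$ which outputs simply the value it sees, such that

$E_{f \sim q}[\Delta(f, A')] \le E_{f \sim q}[\Delta(f, A)] + \epsilon/2$. (1)

2. For any distribution $q \in \mathcal{L}$ of Lipschitz-continuous functions, there is an equivalence-uniform
distribution $q' \in U_r$ (where $r$ may depend on $q$) such that

$\inf_{A \in \mathcal{D} \cap \mathcal{A}^{\mathrm{id}}} E_{f \sim q}[\Delta(f, A)] \le \inf_{A \in \mathcal{D} \cap \mathcal{A}^{\mathrm{id}}} E_{f \sim q'}[\Delta(f, A)] + \epsilon/2$. (2)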

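Using these two inequalities, and applying Yao's Minimax Theorem twice then completes the
proof as follows:

$$\begin{aligned}
\inf_{A \in \mathcal{A}^{\mathrm{id}}} \sup_{f \in L} \Delta(f, A)
  &= \sup_{q \in \mathcal{L}} \inf_{A \in \mathcal{D} \cap \mathcal{A}^{\mathrm{id}}} E_{f \sim q}[\Delta(f, A)] \\
  &\le \sup_{q' \in U_r} \inf_{A \in \mathcal{D} \cap \mathcal{A}^{\mathrm{id}}} E_{f \sim q'}[\Delta(f, A)] + \epsilon/2 \\
  &\le \sup_{q' \in U_r} \inf_{A \in \mathcal{D}} E_{f \sim q'}[\Delta(f, A)] + \epsilon \\
  &\le \sup_{q \in \mathcal{L}} \inf_{A \in \mathcal{D}} E_{f \sim q}[\Delta(f, A)] + \epsilon \\
  &= \inf_{A \in \mathcal{A}} \sup_{f \in L} \Delta(f, A) + \epsilon.
\end{aligned}$$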

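It thus remains to show the two inequalities. We begin with Inequality (1). Let $x$ be the point
at which $A$ samples the function. For any function $f \in L_r$, let $\tilde f$ be the "flipped" function around
$x$, defined by $\tilde f(y) = 2 f(x) - f(y)$ for all $y$. Let $\hat L$ be the set of all functions $f$ such that both $f$
and $\tilde f$ are in $L_r$. Because $\bar{\tilde f} = 2 f(x) - \bar f$ and $|f(x) - \bar f| \le 1$ for all $x$, we obtain that $\hat L \supseteq L_{r-2}$.

Also, because $\tilde f \in [f]$ and $q \in U_r$, we have $q(\tilde f) = q(f)$ whenever $f \in \hat L$. We thus obtain that

$$\begin{aligned}
E_{f \sim q}[\Delta(f, A)]
  &= \int_{\hat L} |g(x, f(x)) - \bar f| \, q(f)\,df + \int_{L_r \setminus \hat L} |g(x, f(x)) - \bar f| \, q(f)\,df \\
  &\ge \frac{1}{2} \int_{\hat L} \left( |g(x, f(x)) - \bar f| + |g(x, \tilde f(x)) - \bar{\tilde f}| \right) q(f)\,df \\
  &= \frac{1}{2} \int_{\hat L} \left( |g(x, f(x)) - \bar f| + |g(x, f(x)) - \bar{\tilde f}| \right) q(f)\,df \\
  &\ge \frac{1}{2} \int_{\hat L} |\bar f - \bar{\tilde f}| \, q(f)\,df.
\end{aligned}$$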


^1 Unfortunately, this definition does not extend to $L$, since $\bar f$ is not bounded, and a uniform distribution is thus
not defined. This issue causes the $\epsilon$ terms in the theorem.

For the first inequality, we dropped the second integral, and used the symmetry of the distribution to
write the first integral twice and then regroup. The second step used that by definition, $\tilde f(x) = f(x)$,
and the third the inverse triangle inequality. By definition, $f(x)$ lies between $\bar f$ and $\bar{\tilde f}$; therefore,
$|\bar f - \bar{\tilde f}| = |\bar f - f(x)| + |f(x) - \bar{\tilde f}|$, and we can further bound

$$\begin{aligned}
E_{f \sim q}[\Delta(f, A)]
  &\ge \frac{1}{2} \int_{\hat L} \left( |\bar f - f(x)| + |f(x) - \bar{\tilde f}| \right) q(f)\,df \\
  &= \int_{\hat L} |\bar f - f(x)| \, q(f)\,df \\
  &\ge \int_{L_r} |\bar f - f(x)| \, q(f)\,df - \epsilon/2,
\end{aligned}$$

because symmetry of $q$ implies that $\mathrm{Prob}[f \notin \hat L] \le 1/r \le \epsilon/2$ (we will set $r \ge 2/\epsilon$), and Lipschitz
continuity implies that $|\bar f - f(x)| \le 1$ for all $x$.

Next, we prove Inequality (2). Let $q$ be an arbitrary distribution, and $r$ large enough such
that $\mathrm{Prob}_{f \sim q}[f \notin L_r] \le \epsilon/2$. First, we truncate $q$ to a distribution $q''$ over $L_r$: We set $q''(f) = 0$
for all $f \notin L_r$, and renormalize by setting $q''(f) = \frac{1}{\mathrm{Prob}_{f' \sim q}[f' \in L_r]} \cdot q(f)$ for $f \in L_r$. Next, we
define a distribution $\bar q$ over equivalence classes $[f]$ as $\bar q([f]) = \int_{f' \in [f]} q''(f')\,df'$. Finally, let $q'$ be defined
by choosing an equivalence class $[f]$ according to $\bar q$, and subsequently choosing a member of $[f] \cap L_r$
uniformly at random; clearly, $q'$ is equivalence-uniform. Let $A \in \mathrm{argmin}_{A \in \mathcal{D} \cap \mathcal{A}^{\mathrm{id}}} E_{f \sim q'}[\Delta(f, A)]$ be a
deterministic algorithm with identity post-processing function minimizing the expected estimation
error for $q'$; let $x$ be the point at which $A$ samples the function.

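Since the algorithm always samples at $x$ and outputs $f'(x)$, the estimation error $|f'(x) - \bar{f'}|$ is
the same for all $f' \in [f]$, because all these $f'$ are simply shifted or mirrored from each other. If used
instead on the initial distribution $q$, $A$ has expected estimation error

$$\begin{aligned}
\int_{L} |f(x) - \bar f| \, q(f)\,df
  &\le \mathrm{Prob}_{f \sim q}[f \notin L_r] + \int_{L_r} |f(x) - \bar f| \, q''(f)\,df \\
  &\le \epsilon/2 + \int_{[f]} \int_{f' \in [f]} |f'(x) - \bar{f'}| \, q''(f')\,df'\,d[f] \\
  &= \epsilon/2 + \int_{[f]} |f(x) - \bar f| \, \bar q([f])\,d[f] \\
  &= \epsilon/2 + \int_{[f]} \int_{f' \in [f]} |f'(x) - \bar{f'}| \, q'(f')\,df'\,d[f] \\
  &= \epsilon/2 + \int_{L_r} |f(x) - \bar f| \, q'(f)\,df.
\end{aligned}$$

The inequality in the first step came from upper-bounding the estimation error outside $L_r$ by 1,
and using that $q''(f) \ge q(f)$. ■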


3 Discrete Metric Spaces 

In this section, we focus on finite metric spaces, consisting of $n$ points. Thus, instead of integrals
and densities, we will be considering sums and probability distributions. The characterization of
using the identity function for post-processing from Theorem 2.2 holds in this case as well; hence,
without loss of generality, we assume that all algorithms simply output the value they observe. The
problem of finding the best probability distribution for a single sample can be expressed as a linear
program, with variables $p_x$ for the sampling probabilities at each of the $n$ points $x$, and a variable
$Z$ for the estimation error.

Minimize $Z$

subject to (i) $\sum_x p_x = 1$

(ii) $\sum_x p_x \cdot |\bar f - f(x)| \le Z$ for all $f \in L$ (3)

(iii) $0 \le p_x \le 1$ for all points $x$

Since this LP (which we refer to as the "exact LP") has infinitely many constraints, our approach
is to replace the set $L$ in the second constraint with a set $Q_\delta$. We will choose $Q_\delta$ carefully to ensure
that it "approximates" $L$ well, and such that the resulting LP below (which we refer to as the
"discretized LP") can be solved efficiently.

Minimize $Z$

subject to (i) $\sum_x p_x = 1$

(ii) $\sum_x p_x \cdot |\bar f - f(x)| \le Z$ for all $f \in Q_\delta$ (4)

(iii) $0 \le p_x \le 1$ for all points $x$

To define the notion of approximation formally, let $o$ be a 1-median of the metric space, i.e., a
point minimizing $\sum_x d(o, x)$. Let $m = \frac{1}{n} \sum_x d(o, x)$ be the average distance of all points from $o$.
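For a finite metric space, a 1-median and the corresponding average distance $m$ can be found by direct enumeration. The following Python sketch is one way to do so; the function names `one_median` and `average_distance` and the five-point example on a line are assumptions made for illustration.

```python
def one_median(points, dist):
    """Return a point o minimizing the total distance to all points."""
    return min(points, key=lambda o: sum(dist(o, x) for x in points))

def average_distance(o, points, dist):
    """Average distance m of all points from o."""
    return sum(dist(o, x) for x in points) / len(points)

# Five points on a line with maximum pairwise distance 1.
points = [0.0, 0.1, 0.3, 0.6, 1.0]
dist = lambda a, b: abs(a - b)

o = one_median(points, dist)
m = average_distance(o, points, dist)
print(o, m)
```

Enumerating all candidates takes $O(n^2)$ distance evaluations, which is adequate for the polynomial-time setting considered here.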

Because we assumed w.l.o.g. that $\max_{x,y \in \mathcal{M}} d(x, y) = 1$, at least one point has distance at
least $\frac{1}{2}$ from $o$, and therefore, $m \ge \frac{1}{2n}$. The median value $m$ forms a lower bound for randomized
algorithms in the following sense.

Lemma 3.1 The worst-case expected error for any randomized algorithm is at least $\frac{1}{4 \cdot 6^\beta} \cdot m$, where
$\beta$ is the doubling dimension of the metric space.

Proof. Consider any randomized algorithm with probability distribution $p$; w.l.o.g., the algorithm
outputs the value it observes. Let $R = \{x \mid \frac{m}{2} \le d(x, o) \le \frac{3m}{2}\}$ be the ring of points at distance
between $\frac{m}{2}$ and $\frac{3m}{2}$ from $o$. We distinguish two cases:

1. If $\sum_{x \in R} p_x \le \frac{1}{2}$, then consider the Lipschitz-continuous function defined by $f(x) = d(x, o)$.
This function has average $\bar f = m$. With probability at least $\frac{1}{2}$, the algorithm samples a point
outside $R$, and thus outputs a value outside the interval $[\frac{m}{2}, \frac{3m}{2}]$, which incurs error at least
$\frac{m}{2}$. Thus, the expected error is at least $\frac{m}{4}$.

2. If $\sum_{x \in R} p_x \ge \frac{1}{2}$, then consider a collection of balls $B_1, \ldots, B_k$ of diameter $\frac{m}{2}$ covering all points
in $R$. Because $R$ is contained in a ball of diameter $3m$, the doubling constraint implies that
$k \le 6^\beta$ balls are sufficient. At least one of these balls, say $B_1$, has $\sum_{x \in B_1} p_x \ge \frac{1}{2 \cdot 6^\beta}$. Fix
an arbitrary point $y \in B_1$, and define the Lipschitz-continuous function $f$ as $f(x) = d(x, y)$.
Because $o$ was a 1-median, we get that $\bar f \ge m$. With probability at least $\frac{1}{2 \cdot 6^\beta}$, the algorithm
will choose a point inside $B_1$ and output a value of at most $\frac{m}{2}$, thus incurring an error of at
least $\frac{m}{2}$. Hence, the expected error is at least $\frac{1}{2 \cdot 6^\beta} \cdot \frac{m}{2} = \frac{1}{4 \cdot 6^\beta} \cdot m$. ■

We now formalize our notion for a set of functions $Q_\delta$ to be a good approximation.



Definition 3.2 ($\delta$-approximating function classes) For any sampling distribution $p$, define
$E_1(p) = \max_{f \in L} \Delta(f, p)$ and $E_2(p) = \max_{f \in Q_\delta} \Delta(f, p)$ to be the maximum error of sampling
according to $p$ against a worst-case function from $L$ and $Q_\delta$, respectively. The class $Q_\delta$ is said to
$\delta$-approximate $L$ if the following two conditions hold:

1. For each $f \in L$, there is a function $f' \in Q_\delta$ such that $|\Delta(f', p) - \Delta(f, p)| \le \frac{\delta}{2} \cdot E_1(p)$, for all
distributions $p$.

2. For each $f \in Q_\delta$, there is a function $f' \in L$ such that $|\Delta(f', p) - \Delta(f, p)| \le \frac{\delta}{2} \cdot E_1(p)$, for all
distributions $p$.

Theorem 3.3 Assume that for every $\delta$, $Q_\delta$ is a class of functions $\delta$-approximating $L$, such that the
following problem can be solved in polynomial time (for fixed $\delta$): Given $p$, find a function $f \in Q_\delta$
maximizing $\Delta(f, p)$.

Then, solving the discretized LP (4) instead of the exact LP (3) gives a PTAS for the problem
of finding a sampling distribution that minimizes the worst-case expected error.

Proof. First, an algorithm to find a function $f \in Q_\delta$ maximizing $\sum_x p_x \cdot |\bar f - f(x)|$ gives a
separation oracle for the discretized LP. Thus, using the Ellipsoid Method (e.g., [5]), an optimal
solution to the discretized LP can be found in polynomial time, for any fixed $\delta$.

Let $p, q$ be optimal solutions to the exact and discretized LPs, respectively. Let $f_1 \in L$
maximize $\sum_x q_x \cdot |\bar f - f(x)|$ over $f \in L$, and $f_2 \in Q_\delta$ maximize $\sum_x p_x \cdot |\bar f - f(x)|$ over $f \in Q_\delta$.
Thus, $\Delta(f_1, q) = E_1(q)$ and $\Delta(f_2, p) = E_2(p)$.

Now, applying the first property from Definition 3.2 to $f_1 \in L$ gives us a function $f_1' \in Q_\delta$ such
that $|\Delta(f_1', q) - E_1(q)| \le \frac{\delta}{2} E_1(q)$. Since $E_2(q) \ge \Delta(f_1', q)$, we obtain that $E_2(q) \ge E_1(q)(1 - \frac{\delta}{2})$.

Similarly, applying the second property from Definition 3.2 to $f_2 \in Q_\delta$ gives us a function $f_2' \in L$
with $|\Delta(f_2', p) - E_2(p)| \le \frac{\delta}{2} E_1(p)$. Since $E_1(p) \ge \Delta(f_2', p)$, we have that $E_1(p) \ge E_2(p) - \frac{\delta}{2} E_1(p)$,
or $E_1(p) \ge \frac{E_2(p)}{1 + \delta/2}$. Also, by optimality of $q$ in $Q_\delta$, $E_2(q) \le E_2(p)$. Thus, we obtain that

$$E_1(q) \le \frac{E_2(q)}{1 - \delta/2} \le \frac{E_2(p)}{1 - \delta/2} \le E_1(p) \cdot \frac{1 + \delta/2}{1 - \delta/2} \le E_1(p)(1 + 2\delta).$$ ■
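The separation oracle used in this proof can be sketched in Python for the case where the function class is small enough to enumerate explicitly (the exhaustive-search variant). Here functions and distributions are represented as tuples over the same ordered point set; the names `delta_error` and `separation_oracle`, the tolerance parameter, and the two-function example class are illustrative assumptions, not the paper's implementation.

```python
def delta_error(f, p):
    """Expected error sum_x p_x * |f_bar - f(x)| with identity post-processing."""
    f_bar = sum(f) / len(f)
    return sum(px * abs(f_bar - fx) for px, fx in zip(p, f))

def separation_oracle(p, z, function_class, tol=1e-12):
    """Return a function whose constraint (ii) is violated by (p, z), or None.

    Exhaustive search: pick the function maximizing the expected error and
    compare that error against the candidate objective value z.
    """
    worst = max(function_class, key=lambda f: delta_error(f, p))
    if delta_error(worst, p) > z + tol:
        return worst
    return None

# Toy class over three points; each tuple lists the function values.
function_class = [(0.0, 0.5, 1.0), (0.0, 0.0, 0.0)]

violated = separation_oracle((1.0, 0.0, 0.0), 0.25, function_class)
feasible = separation_oracle((0.0, 1.0, 0.0), 0.0, function_class)
print(violated, feasible)
```

Inside the Ellipsoid Method, a returned function supplies the violated inequality $\sum_x p_x \cdot |\bar f - f(x)| \le Z$ as the separating hyperplane, while `None` signals that constraint (ii) holds for every function in the class.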



In light of Theorem 3.3, it suffices to exhibit classes of functions $\delta$-approximating $L$ for which
the corresponding optimization problem can be solved efficiently. We do so for metric spaces of
bounded doubling dimension and metric spaces that are contained on the line.



3.1 A PTAS for Arbitrary Metric Spaces 

We first observe that since the error for any translation of a function $f$ is the same as for $f$, we can
assume w.l.o.g. that $f(o) = 0$ for all functions $f$ considered in this section. Thus, in this section,
we implicitly restrict $L$ to functions with $f(o) = 0$.

We next describe a set $Q_\delta$ of functions which $\delta$-approximate $L$. Roughly, we will discretize
function values to different multiples of $\gamma$, and consider distance scales that are different multiples of
$\gamma$. We later set $\gamma$ to a sufficiently small multiple of $\delta / 6^\beta$. We then show in Lemma 3.4 that $Q_\delta$ has
size $n^{\log(2(1+\gamma)/\gamma) \cdot (2/\gamma)^\beta} = n^{O(1)}$ for constant $\delta$; the discretized LP can therefore be solved in
time $O(\mathrm{poly}(n) \cdot n^{\log(2(1+\gamma)/\gamma) \cdot (2/\gamma)^\beta})$
(using exhaustive search for the separation oracle), and we obtain a PTAS for finding the optimum
distribution for arbitrary metric spaces.

We let $k = \log_2 \frac{1}{2m}$ and define a sequence of $k$ rings of exponentially decreasing diameter around
$o$, that divide the space into $k + 1$ regions $R_1, \ldots, R_{k+1}$. Specifically, $R_{k+1} = \{x \mid d(x, o) \le 2m\}$,
and $R_i = \{x \mid 2^{-i} < d(x, o) \le 2^{-(i-1)}\}$ for $i = 1, \ldots, k$. Notice that because $m \ge \frac{1}{2n}$, we have that
$k \le \log n$ suffices to obtain a disjoint cover.

Since the metric space has doubling dimension $\beta$, each region $R_i$ can be covered with at most
$(2/\gamma)^\beta$ balls of diameter $2\gamma \cdot 2^{-i}$. Let $B_{i,j}$ denote the $j$-th ball from the cover of $R_i$; without loss of
generality, each $B_{i,j}$ is non-empty and contained in $R_i$ (otherwise, consider its intersection with $R_i$
instead). We call $B_{i,j}$ the $j$-th grid ball for region $i$. Thus, the grid balls cover all points, and there
are at most $(2/\gamma)^\beta \cdot \log n$ grid balls. See Figure 1 for an illustration of this cover.

For each grid ball $B_{i,j}$, let $o_{i,j} \in B_{i,j}$ be an arbitrary, but fixed, representative of $B_{i,j}$. The
exception is that for the grid ball containing $o$, $o$ must be chosen as the representative. We now
define the class $Q_\delta$ of functions $f$ as follows:

1. For each $i, j$, $f(o_{i,j})$ is a multiple of $\gamma \cdot 2^{-i}$.

2. For all $(i, j)$, $(i', j')$, the function values satisfy the following relaxed Lipschitz condition:
$|f(o_{i,j}) - f(o_{i',j'})| \le d(o_{i,j}, o_{i',j'}) + \gamma \cdot (2^{-i} + 2^{-i'})$.

3. All points in $B_{i,j}$ have the same function value, i.e., $f(x) = f(o_{i,j})$ for all $x \in B_{i,j}$.




Figure 1: Covering with grid balls 

We first show that the size of $Q_\delta$ is polynomial in $n$.

Lemma 3.4 The size of $Q_\delta$ is at most $n^{\log(2(1+\gamma)/\gamma) \cdot (2/\gamma)^\beta}$.

Proof. Because of the first and second constraint in the definition of $Q_\delta$, each point $o_{i,j}$ can
take on at most $\frac{2(1+\gamma)}{\gamma}$ distinct values. Setting the values for all $o_{i,j}$
uniquely determines the function $f$; the relaxed Lipschitz condition will result in some of these
functions not being in $Q_\delta$, and thus only decreases the number of possible functions. Because there
are at most $(2/\gamma)^\beta \cdot \log n$ grid balls, there are at most
$(2(1+\gamma)/\gamma)^{(2/\gamma)^\beta \cdot \log n} = n^{\log(2(1+\gamma)/\gamma) \cdot (2/\gamma)^\beta}$
functions in $Q_\delta$. ■

We need to prove that $Q_\delta$ approximates $L$ well, by verifying that for each function $f \in L$, there
is a "close" function in $Q_\delta$, and vice versa. We first show that for any function satisfying the relaxed
Lipschitz condition, we can change the function values slightly and obtain a Lipschitz continuous
function.

Lemma 3.5 For each $x \in \mathcal{M}$, let $s_x$ be some non-negative number. Assume that $f$ satisfies the
"relaxed Lipschitz condition" $|f(x) - f(y)| \le d(x, y) + s_x + s_y$ for all $x, y$. Then, there is a Lipschitz
continuous function $f' \in L$ such that $|f(x) - f'(x)| \le s_x$ for all $x$.

Proof. We describe an algorithm which runs in iterations $\ell$, and sets the value of one point $x$
per iteration. $S_\ell$ denotes the set of $x$ such that $f'(x)$ has been set. We maintain the following two
invariants after the $\ell$-th iteration:

1. $f'$ satisfies the Lipschitz condition for all pairs of points in $S_\ell$, and $|f'(x) - f(x)| \le s_x$ for all
$x \in S_\ell$.

2. For every function $f''$ satisfying the previous condition, $f'(x) \le f''(x)$ for all $x \in S_\ell$.

Initially, this clearly holds for $S_0 = \emptyset$. And clearly, if it holds after iteration $n$, the function $f'$
satisfies the claim of the lemma.

We now describe iteration $\ell$. For each $x \notin S_{\ell-1}$, let $t_x = \max_{y \in S_{\ell-1}} (f'(y) - d(x, y))$. We show
below that for all $x$, we have $t_x \le f(x) + s_x$. Let $x \notin S_{\ell-1}$ be a point maximizing $\max(f(x) - s_x, t_x)$,
and set $f'(x) = \max(f(x) - s_x, t_x)$. It is easy to verify that this definition satisfies both parts of
the invariant.

It remains to show that $t_x \le f(x) + s_x$ for all points $x \notin S_{\ell-1}$. Assume that $t_x > f(x) + s_x$
for some point $x$. Let $x_1$ be the point in $S_{\ell-1}$ for which $t_x = f'(x_1) - d(x, x_1)$. By definition,
either $f'(x_1) = f(x_1) - s_{x_1}$, or there is an $x_2$ such that $f'(x_1) = t_{x_1} = f'(x_2) - d(x_1, x_2)$. In
this way, we obtain a chain $x_1, \ldots, x_r$ such that $f'(x_i) = f'(x_{i+1}) - d(x_i, x_{i+1})$ for all $i < r$, and
$f'(x_r) = f(x_r) - s_{x_r}$. Rearranging as $f'(x_{i+1}) - f'(x_i) = d(x_i, x_{i+1})$, and adding all these equalities
for $i = 1, \ldots, r-1$ gives us that $f(x_r) - f'(x_1) = s_{x_r} + \sum_{i=1}^{r-1} d(x_i, x_{i+1})$. By assumption, we have
$f'(x_1) - d(x, x_1) = t_x > f(x) + s_x$. Substituting the previous equality, rearranging, and applying
the triangle inequality gives us that

$$f(x_r) - f(x) > s_x + s_{x_r} + d(x, x_1) + \sum_{i=1}^{r-1} d(x_i, x_{i+1}) \ge s_x + s_{x_r} + d(x, x_r),$$

which contradicts the relaxed Lipschitz condition for the pair $x, x_r$. ■
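The iterative construction in this proof can be written down directly for a finite point set. The following Python sketch follows the steps above: in each iteration it computes $t_x$ for every unset point, picks the point maximizing $\max(f(x) - s_x, t_x)$, and fixes $f'$ there. The function name `repair_relaxed_lipschitz`, the dictionary-based representation, and the four-point example are assumptions made for illustration.

```python
import math

def repair_relaxed_lipschitz(points, dist, f, slack):
    """Turn a relaxed-Lipschitz f into a 1-Lipschitz f' with |f - f'| <= slack.

    f and slack map each point to its value f(x) and its allowance s_x.
    The input is assumed to satisfy |f(x) - f(y)| <= d(x, y) + s_x + s_y.
    """
    f_prime = {}
    unset = set(points)
    while unset:
        def lower_bound(x):
            # t_x: the largest lower bound forced by values that are already set.
            t_x = max((f_prime[y] - dist(x, y) for y in f_prime), default=-math.inf)
            return max(f[x] - slack[x], t_x)
        x_best = max(unset, key=lower_bound)
        f_prime[x_best] = lower_bound(x_best)
        unset.remove(x_best)
    return f_prime

# Four points on a line; f violates the Lipschitz condition by at most 2 * 0.1.
points = [0.0, 0.1, 0.5, 1.0]
dist = lambda a, b: abs(a - b)
f = {0.0: 0.0, 0.1: 0.25, 0.5: 0.4, 1.0: 1.0}
slack = {x: 0.1 for x in points}

f_prime = repair_relaxed_lipschitz(points, dist, f, slack)
print(f_prime)
```

Each iteration scans all unset points against all set points, so this direct version uses $O(n^3)$ distance evaluations; it is meant to mirror the proof rather than to be efficient.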

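We now use Lemma 3.5 to obtain, for any given $f \in Q_\delta$, a function $f' \in L$ close to $f$.

Lemma 3.6 Let $f \in Q_\delta$. There exists an $f' \in L$ such that $|\Delta(f, p) - \Delta(f', p)| \le \frac{\delta}{2} \cdot E_1(p)$, for all
distributions $p$.

Proof. Because $f \in Q_\delta$, it must satisfy, for all $(i, j)$ and $(i', j')$, the relaxed Lipschitz condition
$|f(o_{i,j}) - f(o_{i',j'})| \le d(o_{i,j}, o_{i',j'}) + \gamma \cdot (2^{-i} + 2^{-i'})$. Now, we apply Lemma 3.5 with $s_{o_{i,j}} = \gamma \cdot 2^{-i}$
to get function values $f'(o_{i,j})$ for all $i, j$ that satisfy the Lipschitz condition and the condition that
$|f'(o_{i,j}) - f(o_{i,j})| \le \gamma \cdot 2^{-i}$. For any other point $x$, let $L_{\max}(x, f') = \min_{i,j}(f'(o_{i,j}) + d(x, o_{i,j}))$ and
$L_{\min}(x, f') = \max_{i,j}(f'(o_{i,j}) - d(x, o_{i,j}))$, and set $f'(x) = \frac{1}{2} \cdot (L_{\max}(x, f') + L_{\min}(x, f'))$. It is easy
to see that $L_{\min}(x, f') \le L_{\max}(x, f')$ for all $x$, and that this definition gives a Lipschitz continuous
function $f'$. For a point $x \in B_{i,j}$, triangle inequality, the above construction, and the fact that $B_{i,j}$
has diameter at most $2\gamma \cdot 2^{-i}$ imply that

$$\begin{aligned}
|f'(x) - f(x)| &\le |f'(x) - f'(o_{i,j})| + |f'(o_{i,j}) - f(o_{i,j})| + |f(o_{i,j}) - f(x)| \\
  &\le 2\gamma \cdot 2^{-i} + \gamma \cdot 2^{-i} + 0 \\
  &= 3\gamma \cdot 2^{-i}.
\end{aligned}$$

For each point $x$, let $\ell(x)$ be the index of the region $i$ such that $x \in R_i$. Now, using the triangle
inequality and Lemma 3.1, we can bound

$$\begin{aligned}
|\bar{f'} - \bar f| &\le \frac{1}{n} \sum_x |f'(x) - f(x)| \le \frac{1}{n} \sum_x 3\gamma \cdot 2^{-\ell(x)} \\
  &\le \frac{1}{n} \cdot \Big( \sum_{x \notin R_{k+1}} 3\gamma \cdot d(x, o) + \sum_{x \in R_{k+1}} 3\gamma \cdot m \Big) \\
  &\le \frac{1}{n} \cdot (3\gamma n m + 3\gamma n m) \\
  &\le 24 \cdot 6^\beta \cdot \gamma \cdot E_1(p).
\end{aligned}$$

Similarly, we can bound

$$\sum_x p_x \cdot |f'(x) - f(x)| \le \sum_x p_x \cdot 3\gamma \cdot 2^{-\ell(x)} \le 3\gamma \cdot \Big( m + \sum_{x \notin R_{k+1}} p_x \cdot d(x, o) \Big).$$

Let $f''$ be defined as $f''(x) = d(x, o)$. Clearly, $f'' \in L$, $\bar{f''} = m$, and the estimation error for $p$
when the input is $f''$ is

$$\Delta(f'', p) = \sum_x p_x \cdot |d(x, o) - m| \ge \sum_{x \notin R_{k+1}} p_x \cdot d(x, o) - m.$$

Combining these observations, and using Lemma 3.1 and the fact that $\Delta(f'', p) \le E_1(p)$, we
get $\sum_x p_x \cdot |f'(x) - f(x)| \le 6\gamma \cdot m + 3\gamma \Delta(f'', p) \le (8 \cdot 6^\beta + 2) \cdot 3\gamma \cdot E_1(p)$.

Now, by using the fact that $|\Delta(f, p) - \Delta(f', p)| \le |\bar{f'} - \bar f| + \sum_x p_x \cdot |f'(x) - f(x)|$, and setting
$\gamma$ small enough (by the two bounds above, $\gamma \le \frac{\delta}{96 \cdot 6^\beta + 12}$ suffices), we obtain the desired bound. ■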
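The extension step of this proof, which defines $f'$ at a non-representative point as the midpoint of the upper envelope $L_{\max}$ and the lower envelope $L_{\min}$, can be illustrated with a short Python sketch. The function name `lipschitz_extension` and the two-anchor example on a line are assumptions for illustration; the anchor values are assumed to be mutually 1-Lipschitz already, as guaranteed in the proof by Lemma 3.5.

```python
def lipschitz_extension(x, anchors, dist):
    """Extend 1-Lipschitz anchor values to the point x.

    anchors maps each representative o to its value f'(o). The result is the
    midpoint of the largest and the smallest value compatible with the anchors.
    """
    upper = min(v + dist(x, o) for o, v in anchors.items())   # L_max(x, f')
    lower = max(v - dist(x, o) for o, v in anchors.items())   # L_min(x, f')
    return 0.5 * (upper + lower)

# Two representatives on a line with 1-Lipschitz values.
anchors = {0.0: 0.0, 1.0: 0.5}
dist = lambda a, b: abs(a - b)

grid = [i / 10 for i in range(11)]
extended = {x: lipschitz_extension(x, anchors, dist) for x in grid}
print(extended[0.5])
```

Both envelopes are 1-Lipschitz as a minimum and a maximum of 1-Lipschitz functions, so their average is 1-Lipschitz as well, and it agrees with the anchor values at the representatives.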

Finally, we need to analyze the converse direction.

Lemma 3.7 Let $f \in L$. There exists an $f' \in Q_\delta$ such that $|\Delta(f, p) - \Delta(f', p)| \le \frac{\delta}{2} \cdot E_1(p)$, for all
distributions $p$.

Proof. The proof is similar to that of Lemma 3.6. First, for each grid ball representative $o_{i,j}$, we
let $f'(o_{i,j})$ be $f(o_{i,j})$, rounded down (up for negative numbers) to the nearest multiple of $\gamma \cdot 2^{-i}$.
Then, for all points $x \in B_{i,j}$, we set $f'(x) = f'(o_{i,j})$. Clearly, the resulting function $f'$ is in $Q_\delta$.

By a similar argument as before, $|f'(x) - f(x)| \le 3\gamma \cdot 2^{-\ell(x)}$ for all points $x$. Thus, we
immediately get $|\bar{f'} - \bar f| \le 24 \cdot 6^\beta \cdot \gamma \cdot E_1(p)$ as well.

Define the function $f''$ exactly as in the proof of Lemma 3.6. Then, exactly the same bounds
as in that proof apply, and give us the claim. ■

3.2 An FPTAS for points on a line 

In this section, we show that if the metric consists of a discrete point set on the line, then the 
general PTAS of the previous section can be improved to an FPTAS. 

Since we assumed the maximum distance to be 1, we can assume w.l.o.g. that the points are
$0 = x_1 < x_2 < \cdots < x_n = 1$. Also, because w.l.o.g. the post-processing is the identity function, we
only need to consider functions f £ Lq, i.e., such that /(xj) = 0. We define 7 = and the 
class Qs to contain the following functions /: 

1. For each $i$, $f(x_i)$ is a multiple of $\gamma$.

2. The function values satisfy the relaxed Lipschitz condition $|f(x_i) - f(x_j)| \le d(x_i, x_j) + \gamma$ for
all $i, j$.

3. The sum is "close to 0", in the sense that $|\sum_i f(x_i)| \le n\gamma$.

We first establish that, given a probability distribution p, a function f £ Qs maximizing 
X^iPx'i ■ can be found in polynomial time using Dynamic Programming. To set up the 

recurrence, let o[j, t, s] be the maximum expected error that can be achieved with function values 
at xi,...,Xj, under the constraints that f{xj) = t and f{xi) = s. Then, we obtain the 

recurrence 



a[l,t,s] 



Pxj-t ii s = t 
—00 otherwise 



+ = maxj^g[f_(2,^._^^_2,^.)^f+(2,^._^^_a,^.)]_^|y(p^^,_^ji| + a[j,y,s -t]) 

The maximizing value is then max^gj.^^ ,^^] ,^1^ a[n, t, s]. The total number of entries 

is 0{n ■ and each entry requires time O(^) to compute. The overall running time is thus 

0(n • ^) = 0{^), giving us an FPTAS. 
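
The recurrence can be sketched in code as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `gamma` is passed in as a free parameter, function values are stored as integer multiples of `gamma`, the value range `tmax` is a choice made for this sketch, and the third membership condition is read as $|\sum_i f(x_i)| \le n\gamma$. The table is filled forward (each state pushes to its successors), which yields the same maxima as the recurrence, and only consecutive points are constrained, as in the recurrence.

```python
from math import inf


def max_error_on_line(xs, p, gamma):
    """Maximize sum_i p[i] * |f(xs[i])| over the discretized class via the DP.

    Values of f are stored as integers t, meaning f = t * gamma. A state
    (t, s) records f at the current point and the running sum, both in
    units of gamma.
    """
    n = len(xs)
    # Assumption of this sketch: |f| <= 1 + n * gamma is a safe value range.
    tmax = int(1 / gamma + 1e-9) + n
    # Base case a[1, t, s]: finite only when s == t.
    table = {(t, t): p[0] * abs(t) * gamma for t in range(-tmax, tmax + 1)}
    for j in range(1, n):
        # |t - y| * gamma <= (x_{j+1} - x_j) + gamma, in units of gamma.
        step = int((xs[j] - xs[j - 1]) / gamma + 1e-9) + 1
        nxt = {}
        for (y, s_prev), val in table.items():
            for t in range(max(-tmax, y - step), min(tmax, y + step) + 1):
                cand = val + p[j] * abs(t) * gamma
                key = (t, s_prev + t)
                if cand > nxt.get(key, -inf):
                    nxt[key] = cand
        table = nxt
    # Condition 3: |sum_i f(x_i)| <= n * gamma, i.e. |s| <= n in units of gamma.
    return max(val for (t, s), val in table.items() if abs(s) <= n)
```

For instance, `max_error_on_line([0, 0.5, 1], [0, 1, 0], 0.25)` evaluates the relaxed class on three points with all probability mass on the middle point. The dictionary keeps only reachable states, so the $-\infty$ entries of the recurrence never need to be stored.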

All we need now is to show that $Q_\delta$ $\delta$-approximates $L$. We use the following lemma:

Lemma 3.8 For each $f \in L$, there is a function $f' \in Q_\delta$ such that $|A(f, p) - A(f', p)| \le 2\gamma$ for all
distributions $p$. Also, for each $f \in Q_\delta$, there is a function $f' \in L$ such that $|A(f, p) - A(f', p)| \le 6\gamma$
for all distributions $p$.

Proof. For the first part, define $f'$ by rounding each $f(x_i)$ down (toward 0 for negative values)
to the nearest multiple of $\gamma$. Clearly, $f' \in Q_\delta$. Furthermore, the average changes by at most $\gamma$, and
$\sum_x p_x |f'(x) - f(x)| \le \sum_x p_x \gamma = \gamma$.

For the second part, first create a Lipschitz continuous function $f''$ from $f$ according to Lemma
3.5, then define $f'(x) = f''(x) - \bar{f}''$ for all $x$. The first step changed each function value by
at most $\gamma$, and because $\bar{f}'' \le \gamma + \bar{f} \le 2\gamma$, we have that $|f'(x) - f(x)| \le 3\gamma$ for all $x$. Thus,
$|\bar{f}' - \bar{f}| + \sum_x p_x \cdot |f'(x) - f(x)| \le 6\gamma$. ■
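
The rounding step in the first part of the proof can be illustrated on a small instance. The sketch below uses exact rational arithmetic; the point set, the value of `gamma`, and the helper names are hypothetical choices made for illustration, and the third membership condition is read as $|\sum_i f(x_i)| \le n\gamma$.

```python
from fractions import Fraction
import math


def round_toward_zero(values, gamma):
    """Round each value to a multiple of gamma: down if positive, up if negative."""
    return [math.trunc(v / gamma) * gamma for v in values]


def in_relaxed_class(xs, values, gamma):
    """Check the three membership conditions of the discretized class."""
    n = len(xs)
    multiples = all((v / gamma).denominator == 1 for v in values)
    lipschitz = all(
        abs(values[i] - values[j]) <= abs(xs[i] - xs[j]) + gamma
        for i in range(n) for j in range(n)
    )
    small_sum = abs(sum(values)) <= n * gamma
    return multiples and lipschitz and small_sum


xs = [Fraction(0), Fraction(1, 3), Fraction(1)]
mean_x = sum(xs) / len(xs)
f = [x - mean_x for x in xs]            # Lipschitz, with average exactly 0
gamma = Fraction(1, 4)
f_rounded = round_toward_zero(f, gamma)

# Largest pointwise change and the change of the average caused by rounding.
max_change = max(abs(a - b) for a, b in zip(f, f_rounded))
mean_change = abs(sum(f_rounded) / len(xs) - sum(f) / len(xs))
```

Because every value moves by less than `gamma`, both `max_change` and `mean_change` stay below `gamma`, which is the source of the $2\gamma$ bound in the first part of the lemma.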

By Lemma 3.1 applied with $\beta = 1$, any randomized algorithm must have expected error at
least j^. In particular, substituting the definition of 7 = gives us that 67 < | • Ei{p) for all 
distributions $p$. Thus, $Q_\delta$ approximates $L$ well.

4 Sampling in the Interval [0, 1] 

In this section, we focus on what is probably the most basic version of the problem: the metric 
space is the interval [0,1]. In this continuous case, we can explicitly characterize the optimum 
sampling distribution and estimation error. It is easy to see (and follows from a more general result 
in [3]) that the best deterministic algorithm samples the function at $\frac{1}{2}$ and outputs the value read.
The worst-case error of this algorithm is $\frac{1}{4}$. We prove that randomization can lead to the following
improvement. 

Theorem 4.1 An optimal distribution that minimizes the worst-case expected estimation error is 
to sample uniformly from the interval $[2 - \sqrt{3}, \sqrt{3} - 1]$. This sampling gives a worst-case error of
$1 - \frac{\sqrt{3}}{2} \approx 0.134$.

Following the discussion in Section 2, we restrict our analysis w.l.o.g. to functions $f \in L_0$, i.e.,
we assume that $\int_0^1 f(x)\,dx = 0$. Then, the expected error of a distribution $p$ against input $f$ is
$A(f, p) = \int_0^1 p_x |f(x)|\,dx$. The key part of the proof of Theorem 4.1 is to show that when the
algorithm samples uniformly over an interval $[c, 1 - c]$, then with loss of only an arbitrarily small
$\epsilon$, we can focus on functions consisting of just two line segments.

Theorem 4.2 For any $b$, define $f_b(x) = \frac{1}{2} + b^2 - b - |b - x|$. If $p$ is uniform over $[c, 1 - c]$ where
$c = 2 - \sqrt{3}$, then for every $\epsilon > 0$, there exists some $b = b(\epsilon)$ such that for all functions $f \in L_0$, we
have $A(f_b, p) \ge A(f, p) - \epsilon$.

All of Section 4.1 is devoted to the proof of Theorem 4.2. Here, we show how to use Theorem
4.2 to prove the upper bound from Theorem 4.1.

Let $c = 2 - \sqrt{3}$, so that the algorithm samples uniformly from $[c, 1 - c]$. Let $\epsilon > 0$ be arbitrary;
we later let $\epsilon \to 0$. Let $b = b(\epsilon)$ be the value whose existence is guaranteed by Theorem 4.2. We
distinguish two cases:

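1. If $b \le c$, then

$A(f_b, p) = \frac{1}{1-2c} \int_c^{1-c} \left| \frac{1}{2} + b^2 - b - |b - x| \right| dx$
$= \frac{1}{1-2c} \cdot \frac{1}{2} \left( (b^2 + \frac{1}{2} - c)^2 + (\frac{1}{2} - c - b^2)^2 \right)$
$= \frac{1}{1-2c} \left( b^4 + (\frac{1}{2} - c)^2 \right)$.

2. If $b \ge c$, then

$A(f_b, p) = \frac{1}{1-2c} \int_c^{1-c} \left| \frac{1}{2} + b^2 - b - |b - x| \right| dx$
$= \frac{1}{1-2c} \left( 2b f_b(b) + 2cb + f_b(b)^2 - c - b^2 \right)$
$= \frac{1}{1-2c} \left( b^4 - b^2 + 2cb + \frac{1}{4} - c \right)$
$= \frac{1}{1-2c} \left( b^4 - (b - c)^2 + (\frac{1}{2} - c)^2 \right)$.
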

The first formula is increasing in $b$, and thus maximized at $b = c$; at $b = c$, the value equals that of
the second formula, so the maximization must happen for $b \ge c$. A derivative test shows that it is
maximized for $b = \frac{\sqrt{3} - 1}{2}$, giving an error of $1 - \frac{\sqrt{3}}{2}$. By Theorem 4.2, for any function $f$, the error
is at most $1 - \frac{\sqrt{3}}{2} + \epsilon$, and letting $\epsilon \to 0$ now proves an upper bound of $1 - \frac{\sqrt{3}}{2}$ on the error of the
given distribution.
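
The case analysis can be checked numerically. The sketch below compares the two closed forms with a direct midpoint-rule integration of $|f_b|$ and then searches a grid for the maximizing $b$. The grid is restricted to $b \le 0.48$, which is an assumption of this sketch: it keeps the zero of $f_b$ at $\frac{1}{2} + b^2$ inside $[c, 1 - c]$, as the derivation of the second closed form requires. All function names are illustrative.

```python
import math

C = 2 - math.sqrt(3)          # left endpoint of the sampling interval [C, 1 - C]


def f_b(b, x):
    """Two-segment function from Theorem 4.2; its integral over [0, 1] is 0."""
    return 0.5 + b * b - b - abs(b - x)


def error_numeric(b, steps=20000):
    """Midpoint-rule estimate of the expected error of f_b under uniform sampling."""
    width = 1 - 2 * C
    h = width / steps
    total = sum(abs(f_b(b, C + (k + 0.5) * h)) for k in range(steps))
    return total * h / width


def error_closed_form(b):
    """Closed forms of the two cases (b <= C and b >= C)."""
    if b <= C:
        return (b ** 4 + (0.5 - C) ** 2) / (1 - 2 * C)
    return (b ** 4 - (b - C) ** 2 + (0.5 - C) ** 2) / (1 - 2 * C)


# Grid search over the range in which the closed forms apply (see lead-in).
grid = [k / 10000 for k in range(0, 4801)]
best_b = max(grid, key=error_closed_form)
best_error = error_closed_form(best_b)
```

The grid maximizer can then be compared against $1 - \frac{\sqrt{3}}{2}$, and `error_numeric` gives an independent check of either closed form at any chosen $b$ in the grid range.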

Next, we prove optimality of the uniform distribution over $[2 - \sqrt{3}, \sqrt{3} - 1]$, by providing a
lower bound on all randomized sampling distributions. Again, by Theorem 2.2, we focus only on
algorithms which output the value $f(x)$ after sampling at $x$, by incurring an error $\epsilon > 0$ that can
be made arbitrarily small. Our proof is based on Yao's Minimax principle: we explicitly prescribe
a distribution $q$ over $L_0$ such that for any deterministic algorithm using the identity function,
the expected estimation error is at least $1 - \frac{\sqrt{3}}{2}$. Since a deterministic algorithm is characterized
completely by its sampling location $x$, this is equivalent to showing that $E_{f \sim q}[|f(x)|] \ge 1 - \frac{\sqrt{3}}{2}$ for
all $x$.

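We let $b = \frac{\sqrt{3} - 1}{2}$, and define two functions $f, f'$ as $f(x) = \frac{1}{2} + b^2 - b - |x - b|$ and $f'(x) = f(1 - x)$.
The distribution $q$ is then simply to choose each of $f$ and $f'$ with probability $\frac{1}{2}$. Fix a sampling
location $x$; by symmetry, we can restrict ourselves to $x \le \frac{1}{2}$. Because $\bar{f} = \bar{f}' = 0$, the expected
estimation error is

$\frac{1}{2}(|f(x)| + |f'(x)|) = \frac{1}{2}\left( \left| \frac{1}{2} + b^2 - b - |x - b| \right| + \left| \frac{1}{2} + b^2 - b - |1 - x - b| \right| \right) = \begin{cases} \frac{1}{2} - b, & \text{if } x \le b \\ \frac{1}{2} - x, & \text{if } b \le x \le \frac{1}{2} - b^2 \\ b^2, & \text{if } \frac{1}{2} - b^2 \le x \le \frac{1}{2}. \end{cases}$

This function is clearly non-increasing in $x$, and thus minimized at $x = \frac{1}{2}$, where its value is
$b^2 = 1 - \frac{\sqrt{3}}{2}$. Thus, even at the best sampling location $x = \frac{1}{2}$, the error cannot be less than $1 - \frac{\sqrt{3}}{2}$.
This completes the proof of Theorem 4.1. ■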
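
The lower-bound argument can be checked directly. The sketch below assumes $b = \frac{\sqrt{3} - 1}{2}$, the value for which $b^2$ equals the bound $1 - \frac{\sqrt{3}}{2} \approx 0.134$ of Theorem 4.1, and evaluates the expected error of every deterministic sampling location on a grid over $[0, 1]$. The names are illustrative.

```python
import math

B = (math.sqrt(3) - 1) / 2          # assumed value of b; B * B == 1 - sqrt(3)/2
TARGET = 1 - math.sqrt(3) / 2


def f(x):
    """First function in the support of the adversary's distribution q."""
    return 0.5 + B * B - B - abs(x - B)


def f_mirror(x):
    """Second function in the support of q: the reflection of f."""
    return f(1 - x)


def expected_error(x):
    """Expected error of the deterministic algorithm sampling at x against q."""
    return 0.5 * (abs(f(x)) + abs(f_mirror(x)))


# Best the algorithm can do against q, over a grid of sampling locations.
min_over_x = min(expected_error(k / 1000) for k in range(1001))
```

No grid point achieves an expected error below `TARGET`, and the midpoint $x = \frac{1}{2}$ attains it, which matches the statement that no sampling location beats $1 - \frac{\sqrt{3}}{2}$ against $q$.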

Notice that the proof of Theorem 4.1 has an interesting alternative interpretation. For a (finite)
multiset $S \subseteq L_0$ of Lipschitz continuous functions $f$ with $\int_0^1 f(x)\,dx = 0$, we say that $S$ is $\delta$-close
if there exist $x, y$ such that $\frac{1}{|S|} \cdot \sum_{f \in S} |f(x) - y| \le \delta$. In other words, the average distance of the
functions from a carefully chosen reference point is at most $\delta$. Then, the proof of Theorem 4.1
implies:

Theorem 4.3 Every set $S \subseteq L_0$ is $(1 - \frac{\sqrt{3}}{2})$-close, and this is tight.

4.1 Proof of Theorem 4.2

We begin with the following lemma which guarantees that we can focus on functions $f$ with finitely
many zeroes.

Lemma 4.4 For any $\epsilon > 0$ and any function $f$, there exists a function $f'$ such that there are at
most $O(1/\epsilon)$ points $x$ with $f'(x) = 0$, and $A(f', p) \ge A(f, p) - \epsilon$, for all distributions $p$.

Proof. Let $\epsilon > 0$ be arbitrary. We prove the lemma by modifying $f$ to ensure that it meets the
requirements, and showing that its estimation error decreases by at most $\epsilon$ in the process.

We replace $f$ with a function $f'$ with the following properties: (1) $f'$ is Lipschitz continuous,
(2) $\int_0^1 f(x)\,dx = \int_0^1 f'(x)\,dx$, (3) $|f(x) - f'(x)| \le \epsilon$ for all $x$, and (4) for each $j = 1, \ldots, 1/\epsilon$, the set
$Z_j = \{x \in [(j-1)\epsilon, j\epsilon] \mid f'(x) = 0\}$ contains at most three points. The error can change by at
most $\epsilon$ due to the third condition, and the fourth condition ensures the bound on the number of
zeroes.

To describe the construction, first focus on one interval $[(j-1)\epsilon, j\epsilon]$, and define $x^- = \inf Z_j$,
$x^+ = \sup Z_j$, and $\delta = x^+ - x^-$. Now let $\alpha = \frac{\delta}{4} + \frac{1}{\delta} \int_{x^-}^{x^+} f(x)\,dx$, and define the function $f'$ such that

$f'(x) = \begin{cases} \alpha - |x^- + \alpha - x|, & \text{if } x \in [x^-, x^- + 2\alpha] \\ \alpha - \delta/2 + |x^+ + \alpha - \delta/2 - x|, & \text{if } x \in [x^- + 2\alpha, x^+] \\ f(x), & \text{if } x \in [(j-1)\epsilon, j\epsilon] \setminus [x^-, x^+]. \end{cases}$

Intuitively, this replaces the function on the interval by a zigzag shape with the same integral that
has the same leftmost and rightmost zero.

Do this for each $j$. By the careful choice of $\alpha$, the integral remains unchanged. Because each
function value changes by at most $\delta \le \epsilon$, the third condition is satisfied; the fourth condition holds
directly by construction, and Lipschitz continuity is obvious. ■
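
The zigzag construction can be sketched as follows. The value of `alpha` used here is derived from the requirement that the zigzag's area, $\alpha^2 - (\delta/2 - \alpha)^2$, equal the integral of $f$ between its leftmost and rightmost zero; that formula, the helper names, and the sample numbers are assumptions of this sketch.

```python
def zigzag(x_minus, x_plus, integral):
    """Return (alpha, g), where g is the zigzag on [x_minus, x_plus].

    g rises to height alpha and returns to 0 at x_minus + 2 * alpha, then
    dips to alpha - delta / 2 and returns to 0 at x_plus. Its area is
    alpha * delta - delta ** 2 / 4, which is set equal to `integral`.
    """
    delta = x_plus - x_minus
    alpha = delta / 4 + integral / delta

    def g(x):
        if x <= x_minus + 2 * alpha:
            return alpha - abs(x_minus + alpha - x)
        return alpha - delta / 2 + abs(x_plus + alpha - delta / 2 - x)

    return alpha, g


def midpoint_integral(func, lo, hi, steps=20000):
    """Midpoint-rule approximation of the integral of func over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(func(lo + (k + 0.5) * h) for k in range(steps))


# Hypothetical example: zeros at 0.2 and 0.3, original integral 0.001 in between.
alpha, g = zigzag(0.2, 0.3, 0.001)
area = midpoint_integral(g, 0.2, 0.3)
```

A 1-Lipschitz function that vanishes at both ends of an interval of length $\delta$ has an integral of absolute value at most $\delta^2/4$ there, so `alpha` always lies in $[0, \delta/2]$ and both pieces of the zigzag are well defined.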

Next, we show a series of lemmas restricting the functions $f$ under consideration. When we say
that $f$ has a certain property without loss of generality, we mean that changing $f$ to $f'$ with that
property can be accomplished while ensuring that $A(f', p) \ge A(f, p)$ for all uniform distributions
$p$ over intervals $[c, 1 - c]$. Since our goal is to characterize the functions that make the algorithm's
error large, this restriction is indeed without loss of generality.

We focus on points $x \in (c, 1 - c)$ with $f(x) = 0$. Let $c < z_1 < \ldots < z_k < 1 - c$ be all such
points. For ease of notation, we write $z_0 = c$ and $z_{k+1} = 1 - c$. By continuity, $f(x)$ has the same
sign for all $x \in (z_i, z_{i+1})$, for $i = 0, \ldots, k$. We show that w.l.o.g., $f$ is as large as possible over areas
of the same sign.

Lemma 4.5 Assume w.l.o.g. that $f(x) \ge 0$ for all $x \in [z_i, z_j]$, with $j > i$. Then, w.l.o.g., $f$
maximizes the area over $[z_i, z_j]$ subject to the Lipschitz constraint and the function values at $z_i$ and
$z_j$. More formally, w.l.o.g., $f$ satisfies:

1. If $1 \le i < j \le k$, then $f(x) = \min(x - z_i, z_j - x)$ for all $x \in [z_i, z_j]$.

2. If $i = 0$, then $f(x) = \min(f(c) + (x - c), z_1 - x)$ for all $x \in [c, z_1]$, and if $i = k$, then
$f(x) = \min(f(1 - c) + (1 - c) - x, x - z_k)$ for all $x \in [z_k, 1 - c]$.

Proof. We prove the first part here (the proof of the second part is analogous). Define a function
$f'$ as $f'(x) = \min(x - z_i, z_j - x)$ for $x \in [z_i, z_j]$, and $f'(x) = f(x)$ otherwise. Let $f'' = f' - \bar{f}'$, so
that $f''$ is renormalized to have integral 0. Since $f'(x) \ge f(x)$ for all $x$, and $\bar{f} = 0$, we have that
$\bar{f}' \ge 0$. Then

$\int_c^{1-c} |f''(x)| - |f(x)|\,dx$
$= \int_{z_i}^{z_j} |f''(x)| - |f(x)|\,dx + \int_c^{z_i} |f(x) - \bar{f}'| - |f(x)|\,dx + \int_{z_j}^{1-c} |f(x) - \bar{f}'| - |f(x)|\,dx$
$\ge \int_{z_i}^{z_j} (|f'(x) - \bar{f}'| - |f'(x)|) + (|f'(x)| - |f(x)|)\,dx - (1 - 2c - (z_j - z_i)) \bar{f}'$
$\ge \int_{z_i}^{z_j} |f'(x)| - |f(x)|\,dx - \int_{z_i}^{z_j} \bar{f}'\,dx - (1 - 2c - (z_j - z_i)) \bar{f}'$
$= \int_{z_i}^{z_j} f'(x) - f(x)\,dx - (1 - 2c) \bar{f}'$
$= 2c \cdot \bar{f}'$
$\ge 0$.

Thus, the estimation error of $f''$ is at least as large as the one for $f$, so w.l.o.g., $f$ satisfies the
statement of the lemma. ■

Lemma 4.6 W.l.o.g., $k \le 2$, i.e., there are at most two points $x \in (c, 1 - c)$ such that $f(x) = 0$.

Proof. Assume that $f(z_1) = f(z_2) = f(z_3) = 0$. Consider mirroring the function on the interval
$[z_1, z_3]$. Formally, we define $f'(x) = f(z_3 - (x - z_1))$ if $x \in [z_1, z_3]$, and $f'(x) = f(x)$ otherwise.

Clearly, $f'$ is Lipschitz continuous and has the same average and same expected estimation error
as $f$. However, the signs of $f'$ on the intervals $[c, z_1]$ and $[z_1, z_1 + z_3 - z_2]$ are now the same; similarly
for the intervals $[z_1 + z_3 - z_2, z_3]$ and $[z_3, 1 - c]$. Thus, applying Lemma 4.5, we can further reduce
the number of points $x$ with $f(x) = 0$, without decreasing the estimation error. ■
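
The mirroring step can be illustrated numerically. The sketch below uses a hypothetical piecewise-linear test function with zeros at 0.3, 0.4 and 0.6 (it is not normalized to integral 0, which does not matter for this check) and confirms that reflecting it on $[z_1, z_3]$ leaves both the integral over $[0, 1]$ and the integral of the absolute value over $[c, 1 - c]$ unchanged. All names and numbers are illustrative.

```python
import math

C = 2 - math.sqrt(3)
Z1, Z2, Z3 = 0.3, 0.4, 0.6        # hypothetical zeros inside (C, 1 - C)


def f(x):
    """1-Lipschitz test function: positive, small bump, deeper dip, positive."""
    if x < Z1:
        return Z1 - x
    if x <= Z2:
        return min(x - Z1, Z2 - x)
    if x <= Z3:
        return -min(x - Z2, Z3 - x)
    return x - Z3


def mirror(func, z1, z3):
    """Reflect func on [z1, z3]; outside that interval it is unchanged."""
    def mirrored(x):
        if z1 <= x <= z3:
            return func(z3 - (x - z1))
        return func(x)
    return mirrored


def midpoint_integral(func, lo, hi, steps=20000):
    """Midpoint-rule approximation of the integral of func over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(func(lo + (k + 0.5) * h) for k in range(steps))


f_mirrored = mirror(f, Z1, Z3)
same_integral = abs(midpoint_integral(f, 0, 1) - midpoint_integral(f_mirrored, 0, 1))
same_error = abs(
    midpoint_integral(lambda x: abs(f(x)), C, 1 - C)
    - midpoint_integral(lambda x: abs(f_mirrored(x)), C, 1 - C)
)
```

Both differences vanish up to discretization error because the reflection only rearranges the values of the function within $[z_1, z_3]$, which lies inside the sampling interval.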

Hence, it suffices to focus on functions $f$ that have at most two points $z \in (c, 1 - c)$ with
$f(z) = 0$. We distinguish three cases accordingly:

1. If there is no point $z \in (c, 1 - c)$ with $f(z) = 0$, then $f(c)$ and $f(1 - c)$ have the same
sign; without loss of generality, $f$ is negative over $(c, 1 - c)$. Then, the expected error is
maximized when $\int_0^c f(x)\,dx$ and $\int_{1-c}^1 f(x)\,dx$ are as positive as possible, subject to the Lipschitz
condition and the constraint that $\int_0^1 f(x)\,dx = 0$. Otherwise, we could increase the value of
$\int_0^c f(x)\,dx$ and $\int_{1-c}^1 f(x)\,dx$, and then lower the function to restore the integral to 0. By doing
this, the expected estimation error cannot decrease. Thus, by Lemma 4.5, $f$ is of the form
$f(x) = |x - b| + f(b)$, where $b = \mathrm{argmin}_{x \in (c, 1-c)} f(x)$.

2. If there is exactly one point $z \in (c, 1 - c)$ with $f(z) = 0$, then $f(c)$ and $f(1 - c)$ have opposite
signs. Without loss of generality, assume that $f(c) > 0 > f(1 - c)$ and that $z \le \frac{1}{2}$. (Otherwise,
we could consider $f'(x) = f(1 - x)$ instead.)

The expected error is maximized when $f(c)$ is as large as possible, and $\int_z^{1-c} f(x)\,dx$ is as
negative as possible, subject to the Lipschitz condition and the constraint that $\int_0^1 f(x)\,dx = 0$.
Because $z \le \frac{1}{2}$ and the integral of the function $f'(x) = z - x$ is thus negative, by starting from
$f'$, then raising the function in the interval $[1 - c, 1]$ and, if necessary, increasing $f'(1 - c)$,
it is always possible to ensure that $f(x) = z - x$ for all $x \in [0, z]$. Then, $\int_z^{1-c} f(x)\,dx$ is as
negative as possible if, for some value $b$, $f$ is of the following form: $f(x) = -(x - z)$ for
$x \le b$, and $f(x) = -(b - z) + (x - b) = z + x - 2b$ for $x \ge b$. Thus, $f$ overall is of the form
$f(x) = |x - b| - (b - z)$.

3. If there are two points $z_1 < z_2 \in (c, 1 - c)$ with $f(z_1) = f(z_2) = 0$, then $f(c)$ and $f(1 - c)$
have the same sign; w.l.o.g., they are both positive. We distinguish two subcases:

• If $z_2 - z_1 \ge (1 - c - z_2) + (z_1 - c)$, then the expected error is maximized when $\int_0^c f(x)\,dx$
and $\int_{1-c}^1 f(x)\,dx$ are as positive as possible, subject to the Lipschitz condition and the
constraint that $\int_0^1 f(x)\,dx = 0$. Otherwise, we could increase the value of $\int_0^c f(x)\,dx$
and $\int_{1-c}^1 f(x)\,dx$, and then lower the function by some small resulting $\delta$ to restore the
integral to 0. If the function is thus lowered by $\delta$, then for the interval $(z_1, z_2)$, the error
increases by $\delta$, while for the intervals $[c, z_1]$ and $[z_2, 1 - c]$, it at most decreases by $\delta$. By
the condition $z_2 - z_1 \ge (1 - c - z_2) + (z_1 - c)$, the lowering would overall increase the
error. Now, applying Lemma 4.5 gives us that w.l.o.g., $f(x) = |x - \frac{z_1 + z_2}{2}| - \frac{z_2 - z_1}{2}$.

• If $z_2 - z_1 < (1 - c - z_2) + (z_1 - c)$, then the expected error is maximized when $\int_0^c f(x)\,dx$
and $\int_{1-c}^1 f(x)\,dx$ are as negative as possible, subject to the Lipschitz condition and the
constraint that $\int_0^1 f(x)\,dx = 0$. Otherwise, we could decrease the value of $\int_0^c f(x)\,dx$
and $\int_{1-c}^1 f(x)\,dx$, and then raise the function to restore the integral to 0. An argument
just as in the previous case shows that the error cannot decrease. Hence, w.l.o.g.,
$f(x) = f(c) - (c - x)$ for $x \in [0, c]$ and $f(x) = f(1 - c) - (x - (1 - c))$ for $x \in [1 - c, 1]$.
We next claim that there must be at least one point $z \in [0, c) \cup (1 - c, 1]$ such that
$f(z) = 0$. For contradiction, assume that $f$ is positive in $[0, c) \cup (1 - c, 1]$. Then,
$f(0), f(1) > 0$, and therefore, $f(c), f(1 - c) > c$. Because $f(z_1) = f(z_2) = 0$, this implies
that $z_1 > 2c$ and $z_2 < 1 - 2c$. But with our choice of $c = 2 - \sqrt{3}$, this implies that
$z_2 < z_1$, a contradiction.

Without loss of generality, assume that the interval $(1 - c, 1]$ contains such a point $z$;
define $z_3 = \min\{z \in (1 - c, 1] \mid f(z) = 0\}$. Further, assume that we have applied
Lemma 4.5 to $f$, such that $f$ maximizes the area in the intervals $[c, z_1]$, $[z_1, z_2]$ and
$[z_2, 1 - c]$. Consider mirroring the function $f$ on the interval $[z_1, z_3]$. Formally, we define
$f'(x) = f(z_3 - (x - z_1))$ if $x \in [z_1, z_3]$, and $f'(x) = f(x)$ otherwise. Clearly, $f'$ is Lipschitz
continuous and has the same integral (namely, zero) as $f$.

Next, we define a new function $f''$ by modifying $f'$ so that it is as negative as possible
in the interval $[z_1 + z_3 - z_2, 1]$. Formally, we define $f''(x) = z_1 + z_3 - z_2 - x$ if $x \in
[z_1 + z_3 - z_2, 1]$, and $f''(x) = f'(x)$ otherwise. (See Figure 2 for an illustration of this
mirroring, and the resulting shapes of $f'$ and $f''$.)

Notice that $f''$ is not normalized to have an integral of 0, since
$\int_{z_1+z_3-z_2}^{1} f''(x)\,dx < \int_{z_1+z_3-z_2}^{1} f'(x)\,dx$. However, since (by assumption on the current case) $z_2 - z_1 < (1 -
c - z_2) + (z_1 - c)$, raising $f''$ to restore the integral to 0 can only increase the resulting
estimation error, by an argument similar to the previous case. The remainder of the
proof for this case is as follows: We will first prove that the estimation error of $f''$ is at
least as large as the estimation error of $f$. This implies that even after normalizing $f''$,
its estimation error remains at least as large as that of $f$. Finally we can use Lemma
4.5 on the normalized version of $f''$ to reduce the number of points $x \in (c, 1 - c)$ with
$f''(x) = 0$ down to either one or zero, without decreasing the estimation error, and thus
reduce this subcase to one of the previous two cases.

Figure 2: Mirroring $f$ in $[z_1, z_3]$

We now compare the estimation error of $f$ against that of $f''$. Simply by definition of $f''$,
we have that Jl;'\f{x)\dx = /^^i+^i-^-^^) \f"{x)\dx, and \f{x)\dx = \f"{x)\dx. 
Furthermore, we have that J^^ \f{x)\dx < J^_^^(^i_^_^^^ \f"{x)\dx. This follows, since for 
any values p,q such that < p < q, we have || — |x — ^\\dx < Jq \p — x\dx. Hence, 

$\int_c^{1-c} |f''(x)|\,dx \ge \int_c^{1-c} |f(x)|\,dx$.

In all three cases, we have thus shown that w.l.o.g., $f(x) = |x - b| - t$ for some values $b, t$.
Finally, the normalization $\int_0^1 f(x)\,dx = 0$ implies that $t = \frac{1}{2} + b^2 - b$, completing the proof of
Theorem 4.2.
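
For completeness, the normalization step can be written out, assuming $0 \le b \le 1$:

```latex
\int_0^1 \bigl(|x - b| - t\bigr)\,dx
  = \int_0^b (b - x)\,dx + \int_b^1 (x - b)\,dx - t
  = \frac{b^2}{2} + \frac{(1 - b)^2}{2} - t
  = b^2 - b + \frac{1}{2} - t ,
\qquad\text{so}\qquad
t = \frac{1}{2} + b^2 - b .
```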

5 Future Work 

Our work is a first step toward obtaining optimal (as opposed to asymptotically optimal) randomized
algorithms for choosing $k$ sample locations to estimate an aggregate quantity of a function $f$.
The most obvious next step is to extend our results to the case of estimating the average using $k$
samples. It would be interesting to know whether approximation guarantees for the $k$-median problem (the
deterministic counterpart) can be exceeded using a randomized strategy.

Also, our precise characterization of the optimal sampling distribution for functions on the [0, 1]
interval should be extended to higher-dimensional continuous metric spaces. Another natural direction
is to consider other aggregation goals, such as predicting the function's maximum, minimum,
or median. For predicting the maximum from $k$ deterministic samples, a 2-approximation algorithm
was given in [4], which is best possible unless P=NP. However, it is not clear if equally good
approximations can be achieved for the randomized case. For the median, even the deterministic
case is open.

On a technical note, it would be interesting to determine whether finding the best sampling distribution for
the single sample case is NP-hard. While we presented a PTAS in this paper, no hardness result is
currently known.